Abstract

We apply Doeblin's ergodicity coefficient as a computational tool to approximate the occupancy
distribution of a set of states in a homogeneous but possibly non-stationary finite Markov
chain. Our approximation is based on new properties satisfied by this coefficient, which allow us
to approximate a chain of duration n by independent and short-lived realizations of an auxiliary
homogeneous Markov chain of duration of order ln(n). Our approximation may be particularly
useful when exact calculations via first-step methods or transfer matrices are impractical, and
asymptotic approximations may not yet be reliable. Our findings may find applications in pattern
problems for Markovian and non-Markovian sequences that are treatable via embedding
techniques.

1 Introduction. 

In what follows, $S$ is a given finite set and $T \subset S$ a certain non-empty subset of states. For a
fixed integer $n \ge 1$, consider a first-order homogeneous Markov chain $X = (X_t)_{0 \le t \le n}$ with initial
distribution $\mu : S \to [0,1]$ and probability transition matrix $p : S \times S \to [0,1]$. We identify
complex-valued functions defined over $S$ and $S \times S$ as row vectors and matrices, respectively. In particular,
the distribution of $X_t$ is given by the vector $\mu p^t$.

Our object of interest is the occupancy distribution of $T$, i.e. the distribution of the random
variable:

\[ T_n = \sum_{t=1}^{n} [X_t \in T], \tag{1} \]

where $[\cdot]$ denotes the Iverson bracket. Random variables of this sort are common in assessing
the frequency statistics of patterns in random sequences, which typically model text or genomic
sequences. Although various probabilistic and analytic techniques have been used for this purpose,
the Markov chain embedding technique is among the most versatile ones. This technique seems
to have originated in the works of Gerber and Li [TT], Biggins and Cannings [I], and Bender and
Kochman [3]. It usually consists in embedding a random sequence into the state space of a suitable
finite automaton that is informative of the pattern of interest, and it has been completely
systematized for regular patterns, i.e. patterns described by a regular expression, and Markovian
models of random sequences [19] [18]. In addition, the technique has also shown some promise for
assessing regular patterns in non-Markovian sequences, i.e. sequences with an arbitrary correlation
structure [TBI.

All the complexity associated with determining or approximating the distribution of $T_n$ is due
to the distributional dependence between the consecutive states visited by the chain $X$. There are
various ways, some more ad hoc and others more systematic, to pinpoint this distribution. For
small values of $n$, exact calculations are possible via one-step methods [7] or transfer matrices [10].
Furthermore, transfer matrices lead to Normal approximations for large values of $n$, e.g. as shown
in [19] for frequency statistics of regular patterns under Markovian models. On the other hand, for
stationary chains, Poisson [2] and compound Poisson approximations [5] have been proposed when $T$
is a rare set, i.e. the stationary measure of $T$ is small. A specialized instance of these approximations
is the Polya-Aeppli distribution, which occurs as the limiting distribution of frequency statistics of
rare words under stationary Markov models [20].

Our main motivation is to approximate the distribution of $T_n$ in the intermediate regime where
$n$ is too large for exact calculations but too small to rely on asymptotic approximations, when $X$ is
possibly non-stationary. Our approach relies on a novel probabilistic interpretation and use of the
so-called Doeblin's ergodicity coefficient associated with $p$ [5], which is defined as:

\[ \alpha(p) = \sum_{j \in S} \min_{i \in S} p(i,j). \tag{2} \]

The above motivation is far from artificial. For instance, extensive research is being performed
to understand the evolution of complex but short RNA sequences from simpler but functional RNA
sequences [TH [13]. In contexts like this, the pitfall of the Normal approximation of $T_n$ is the slow rate
of convergence, of order $n^{-1/2}$. On the other hand, the stationarity assumption of the aforementioned
Poisson approximations is unrealistic in the context of the Markov chain embedding technique, even
if the background model of a genomic sequence is Markovian and stationary. For example, for
regular patterns, the initial distribution of the embedded process is concentrated in a few states (of
an exponentially large state space) associated with the unique initial state of the (minimal) $k$-th
order automaton that recognizes the pattern of interest [TS].



1.1 Ergodicity coefficients of Markov chains. 

This section finishes the Introduction with a brief discussion about ergodicity coefficients and the
historical developments surrounding the characterization of weak ergodicity of inhomogeneous Markov
chains.

In what follows we denote the set of all probability transition matrices over the state space $S$ as
$\mathcal{P}$. The set of all stochastic matrices with identical rows is denoted $\mathcal{E}$; in particular,
$\mathcal{E} \subset \mathcal{P}$. We refer to matrices in $\mathcal{E}$ as i.i.d. models because a homogeneous Markov
chain with a probability transition matrix in this set is just a sequence of independent and identically
distributed (i.i.d.) $S$-valued random variables.

Broadly speaking, an ergodicity coefficient is any continuous function $\gamma : \mathcal{P} \to [0,1]$. Such a
function is said to be proper when $\gamma(p) = 1$ if and only if $p \in \mathcal{E}$. Clearly, Doeblin's coefficient as
defined in (2) is proper. Other ergodicity coefficients found in the literature are:

\[ \gamma_1(p) \overset{\text{def}}{=} \max_{j} \min_{i} p(i,j); \]

\[ \gamma_2(p) \overset{\text{def}}{=} \min_{i,j} \sum_{s} \min\{p(i,s), p(j,s)\} = 1 - \frac{1}{2} \max_{i,j} \sum_{s} |p(i,s) - p(j,s)|; \]

\[ \gamma_3(p) \overset{\text{def}}{=} 1 - \max_{s} \max_{i,j} |p(i,s) - p(j,s)|. \]

Only the last two of these are proper and $\gamma_2$ is called Markov's ergodicity coefficient.
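
To make the four coefficients concrete, the following sketch evaluates them with exact rational arithmetic. The function names and the two small matrices are illustrative assumptions and are not taken from the paper; the second matrix is an i.i.d. model, which shows why $\gamma_1$ fails to be proper.

```python
from fractions import Fraction as F

def doeblin(p):
    """alpha(p): sum over columns j of min_i p(i, j)."""
    return sum(min(col) for col in zip(*p))

def gamma1(p):
    """gamma_1(p): max over columns j of min_i p(i, j)."""
    return max(min(col) for col in zip(*p))

def gamma2(p):
    """gamma_2(p): min over pairs of rows of their overlap sum_s min{p(i,s), p(j,s)}."""
    return min(sum(min(a, b) for a, b in zip(ri, rj)) for ri in p for rj in p)

def gamma3(p):
    """gamma_3(p): 1 minus the largest gap between two entries of a same column."""
    return 1 - max(max(col) - min(col) for col in zip(*p))

# Hypothetical matrices, chosen only for illustration.
p = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 4), F(1, 4), F(1, 2)]]
iid = [[F(1, 2), F(1, 2)],
       [F(1, 2), F(1, 2)]]

coeffs = (gamma1(p), doeblin(p), gamma2(p), gamma3(p))
print(coeffs)
print(gamma1(iid), doeblin(iid), gamma2(iid), gamma3(iid))
```

For the i.i.d. model every coefficient except $\gamma_1$ equals one, which is consistent with the remark that $\gamma_1$ is not proper.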

Ergodicity coefficients have been proposed for a range of purposes, such as analyzing the contractive
property of a stochastic matrix [17] and bounding its non-Perron-Frobenius eigenvalues [23]. However,
they have mostly been used to analyze the asymptotic behavior of non-homogeneous Markov
chains [6] [T2l [22]. This entails understanding the asymptotic behavior of products of the form
$\prod_{k=m}^{n} p_k$, with $m \le n$, for a given sequence $(p_k)_{k \ge 0} \subset \mathcal{P}$. Such a sequence is said
to be weakly ergodic if for all $m \ge 0$ and $i, j, s \in S$ the following applies:

\[ \lim_{n \to \infty} \Big( \prod_{k=m}^{n} p_k \Big)(i,s) - \Big( \prod_{k=m}^{n} p_k \Big)(j,s) = 0. \tag{3} \]

The following condition, known as Markov's theorem [21], is sufficient for weak ergodicity:

\[ \sum_{k=0}^{\infty} \gamma_1(p_k) = +\infty. \tag{4} \]

This condition is only sufficient in great part because $\gamma_1$ is not proper [22].

In more probabilistic terms, consider a first-order Markov chain $Y = (Y_k)_{k \ge 0}$ with state space
$S$ such that $P[Y_k = s \mid Y_{k-1}, \ldots, Y_0] = p_k(Y_{k-1}, s)$, for each $k \ge 1$. The sequence $(p_k)_{k \ge 0}$ is weakly
ergodic if and only if any two independent realizations of $Y$ meet infinitely often on a same state,
with probability one. This characterization is due to Doeblin and appeared without proof in the
report [B]. The way in which Doeblin proved this result is a matter of speculation, and it was lost with
his death in World War II (see [22] for the historical developments). Furthermore, Doeblin's report
remained unnoticed for almost two decades. During this period, the following condition was proved
to be both necessary and sufficient for weak ergodicity [12]:

there exists a strictly increasing sequence of positive integers $(n_k)_{k \ge 0}$ such that:

\[ \sum_{k=0}^{\infty} \gamma_2 \Big( \prod_{i=n_k}^{n_{k+1}-1} p_i \Big) = +\infty. \tag{5} \]

In contrast, Doeblin's characterization of weak ergodicity is the following [6]:

there exists a strictly increasing sequence of positive integers $(n_k)_{k \ge 0}$ such that:

\[ \sum_{k=0}^{\infty} \alpha \Big( \prod_{i=n_k}^{n_{k+1}-1} p_i \Big) = +\infty. \tag{6} \]

Since $\gamma_1(p) \le \alpha(p) \le \gamma_2(p)$, for each $p \in \mathcal{P}$, the sufficient condition in (4) is a special instance of
the conditions in (5) and (6). Though nobody knows how Doeblin proved that conditions (3) and
(6) are equivalent, Seneta ventured in [32] a possible proof that relies on the following two facts,
valid for any sequence $(p_k)_{k \ge 0} \subset \mathcal{P}$:

(a) $\Big( 1 - \gamma_2\Big( \prod_{k=0}^{n} p_k \Big) \Big) \le \prod_{k=0}^{n} \big( 1 - \gamma_2(p_k) \big)$, for all $n \ge 0$;

(b) if $\sum_{k=0}^{\infty} \alpha(p_k) = +\infty$ then $\sum_{k=0}^{\infty} \gamma_1(p_k) = +\infty$.



Paper overview and organization. Our paper is mostly about Doeblin's ergodicity coefficient,
which we encountered by accident while aiming at accurate but low-to-moderate complexity
approximations of occupancy distributions in homogeneous Markov chains. Here we mostly state and
prove new properties of Doeblin's coefficient which we would have never explored otherwise. The
more detailed implications of these properties for approximating occupancy distributions will be part
of a follow-up publication based on the M.S. thesis [5].

In §2 we demonstrate new properties of Doeblin's coefficient which allow us to provide a
new and more elementary proof of Doeblin's characterization of weak ergodicity (see §2.1). In §3
we relate Doeblin's coefficient to a decomposition of the chain into several independent realizations
of an auxiliary chain. This leads to a (hopefully) refreshing explanation of the strong ergodicity
of irreducible and aperiodic Markov chains (see §3.1). Furthermore, the decomposition allows us
to parse (with high probability) the trajectory of a Markov chain of duration $n$ into short-lived
realizations of an auxiliary chain of duration of order $\ln(n)$ (see §3.2). We exploit this feature to
propose new approximations for occupancy distributions based on Doeblin's coefficient, which we
compare against Normal and Poisson approximations in a numerical example.

2 A candidate for Doeblin's missing proof. 

Recall that Doeblin's ergodicity coefficient associated with a $p \in \mathcal{P}$ is the quantity:

\[ \alpha(p) = \sum_{j \in S} \min_{i \in S} p(i,j). \]

Because $\alpha(\cdot)$ is a proper ergodicity coefficient, $\alpha(p)$ is close to 1 when $p$ is in the proximity of some
i.i.d. model. However, since the set of i.i.d. models is closed, there should be several i.i.d. models
close to $p$. The following result identifies an affine space of i.i.d. models that are in the proximity of
$p$.

Theorem 2.1. For each $p \in \mathcal{P}$, the following applies:

(a) If $0 \le \alpha \le \alpha(p)$ then there is $E \in \mathcal{E}$ and $M \in \mathcal{P}$ such that $p = \alpha \cdot E + (1 - \alpha) \cdot M$.

(b) $\alpha(p) = \sup \{ \alpha \in [0,1] \mid (\exists E \in \mathcal{E})(\exists M \in \mathcal{P}) : p = \alpha \cdot E + (1 - \alpha) \cdot M \}$.

(c) Assume that $E \in \mathcal{E}$ and $M \in \mathcal{P}$ are such that $p = \alpha(p) \cdot E + (1 - \alpha(p)) \cdot M$.
If $\alpha(p) < 1$ then $\alpha(M) = 0$, i.e. $M$ has a zero in each column.
If $\alpha(p) > 0$ then $E(i,j) = \frac{1}{\alpha(p)} \cdot \min_{s} p(s,j)$.

Proof. Define $\beta = \alpha(p)$. We first show part (a) in the theorem, for which we may assume without
loss of generality that $\beta > 0$. In this case, all reduces to prove that there is $E \in \mathcal{E}$ and $M \in \mathcal{P}$ such
that

\[ p = \beta \cdot E + (1 - \beta) \cdot M. \tag{7} \]

Indeed, if $0 \le \alpha \le \beta$ then the above implies that $p = \alpha E + (\beta - \alpha) E + (1 - \beta) M = \alpha E + (1 - \alpha) Q$,
for some $Q \in \mathcal{P}$, because the matrix $(\beta - \alpha) E + (1 - \beta) M$ has nonnegative entries and the sum of
the entries in each of its rows is $(1 - \alpha)$. To prove the above identity, consider the matrix $E \in \mathcal{E}$
with entries $E(i,j) = \min_{s} p(s,j) / \beta$. Since $\beta E(i,j) \le p(i,j)$, the matrix $(p - \beta E)$ has nonnegative
entries and row sums equal to $(1 - \beta)$. In particular, if $\beta = 1$ then $p = E$ and the above identity
holds with any choice of $M$. Otherwise, it suffices to select $M = (p - \beta E)/(1 - \beta)$. This shows (7)
and completes the proof of part (a).

To show part (b), notice that if $0 \le \alpha \le 1$ is such that $p = \alpha E + (1 - \alpha) M$, with $E \in \mathcal{E}$ and
$M \in \mathcal{P}$, then $\beta = \alpha(p) \ge \alpha \cdot \alpha(E) = \alpha$. Part (b) is now a direct consequence of part (a).

Finally we show part (c). Thus assume that $E \in \mathcal{E}$ and $M \in \mathcal{P}$ are such that $p = \beta E + (1 - \beta) M$.
If $\beta = 1$ then $p = E$ and the identity $E(i,j) = \min_{s} p(s,j)/\beta$ is trivial. On the other hand, if
$\beta = 0$ then $p$ must have a zero in each column and $M = p$. Without loss of generality we may
therefore assume that $0 < \beta < 1$. We first show that $\alpha(M) = 0$. Set $\alpha' = \alpha(M)$. Due to part
(a), there exists $E' \in \mathcal{E}$ and $M' \in \mathcal{P}$ such that $p = \beta E + (1 - \beta) \alpha' E' + (1 - \beta)(1 - \alpha') M'$. Hence,
by part (b), $\beta = \alpha(p) \ge (\beta + (1 - \beta) \alpha')$ and as a result $\alpha' = 0$. To complete the proof of the theorem, fix
$j$ and notice that $\beta E(i,j) = p(i,j) - (1 - \beta) M(i,j)$. In particular, since $M$ has a zero in each
column, there is $s$ such that $\beta E(s,j) = p(s,j)$. Finally, since $\beta E(i,j) \le p(i,j)$, we conclude that
$\beta E(i,j) = \min_{s} p(s,j)$. This completes the proof. □
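
The construction used in the proof of Theorem 2.1 is easy to carry out numerically. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, and the input `p2` is the matrix $p^2$ of the worked example in §3.1 as we read it from that section. It returns $\alpha(p)$, the common row $e$ of $E$, and the matrix $M$.

```python
from fractions import Fraction as F

def doeblin_decomposition(p):
    """Return (alpha, e, M) with p = alpha*E + (1 - alpha)*M, where every row of E is e.

    e is None when alpha == 0 (then M = p); M is None when alpha == 1 (then p = E).
    """
    col_min = [min(col) for col in zip(*p)]      # column minima of p
    alpha = sum(col_min)                         # Doeblin's coefficient
    if alpha == 0:
        return alpha, None, [list(row) for row in p]
    e = [c / alpha for c in col_min]
    if alpha == 1:
        return alpha, e, None
    M = [[(x - c) / (1 - alpha) for x, c in zip(row, col_min)] for row in p]
    return alpha, e, M

# p^2 from the example of Section 3.1 (assumed transcription).
p2 = [[F(13, 20), F(7, 50), F(21, 100)],
      [F(2, 25), F(83, 100), F(9, 100)],
      [F(6, 25), F(9, 50), F(29, 50)]]
alpha, e, M = doeblin_decomposition(p2)
print(alpha, e)
for row in M:
    print(row)
```

Each column of the returned `M` contains a zero, as part (c) of the theorem requires.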
Due to part (a) in the previous theorem, for all $p \in \mathcal{P}$ there is $E \in \mathcal{E}$ and $M \in \mathcal{P}$ such that:

\[ (p - E) = (1 - \alpha(p)) \cdot (M - E). \]

In particular, the smaller $(1 - \alpha(p))$, the closer $p$ is to an i.i.d. model. According to the following
result, when one multiplies two or more stochastic matrices, one can only get "closer" to the set of
i.i.d. models. This is the key ingredient for our proof of Doeblin's characterization of weak ergodicity
in the following section.

Corollary 2.2. $(1 - \alpha(pq)) \le (1 - \alpha(p)) \cdot (1 - \alpha(q))$, for all $p, q \in \mathcal{P}$.

Proof. Define $\alpha_1 = \alpha(p)$ and $\alpha_2 = \alpha(q)$. Due to part (a) in Theorem 2.1, there are matrices
$E_1, E_2 \in \mathcal{E}$ and $M_1, M_2 \in \mathcal{P}$ such that $p = \alpha_1 E_1 + (1 - \alpha_1) M_1$ and $q = \alpha_2 E_2 + (1 - \alpha_2) M_2$. In
particular, since $p E_2 = E_2$, we see that $pq = \alpha_2 E_2 + \alpha_1 (1 - \alpha_2) E_1 M_2 + (1 - \alpha_1)(1 - \alpha_2) M_1 M_2$. But
notice that $E_1 M_2 \in \mathcal{E}$. Consequently, the rows of the matrix $\alpha_2 E_2 + \alpha_1 (1 - \alpha_2) E_1 M_2$ are identical,
with common row sum $(\alpha_1 + \alpha_2 - \alpha_1 \alpha_2)$. As a result, there is $E_3 \in \mathcal{E}$ such that

\[ pq = (\alpha_1 + \alpha_2 - \alpha_1 \alpha_2) \cdot E_3 + (1 - \alpha_1)(1 - \alpha_2) \cdot M_1 M_2 \]

\[ \phantom{pq} = \big( 1 - (1 - \alpha_1)(1 - \alpha_2) \big) \cdot E_3 + (1 - \alpha_1)(1 - \alpha_2) \cdot M_1 M_2. \]

Finally, due to part (b) in Theorem 2.1, it follows from the above that

\[ \alpha(pq) \ge 1 - (1 - \alpha_1)(1 - \alpha_2), \]

which proves the corollary. □
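
As a sanity check of Corollary 2.2, the sketch below verifies the inequality for powers of a single matrix, using exact fractions. It assumes that the example matrix (11) of §3.1 reads as written here; its zero entries are inferred from $\alpha(p) = 0$ and from the stated value of $p^2$, so treat this transcription as an assumption. The helper names are hypothetical.

```python
from fractions import Fraction as F

def doeblin(p):
    """alpha(p): sum over columns of the column minimum."""
    return sum(min(col) for col in zip(*p))

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Assumed transcription of the example matrix (11).
p = [[F(3, 10), F(0), F(7, 10)],
     [F(0), F(9, 10), F(1, 10)],
     [F(4, 5), F(1, 5), F(0)]]
p2 = matmul(p, p)
p4 = matmul(p2, p2)

a1, a2, a4 = doeblin(p), doeblin(p2), doeblin(p4)
# Corollary 2.2 applied to (p, p) and to (p2, p2).
holds = (1 - a2 <= (1 - a1) ** 2) and (1 - a4 <= (1 - a2) ** 2)
print(a1, a2, a4, holds)
```

Even though $\alpha(p) = 0$ carries no information, the products $p^2$ and $p^4$ move strictly closer to the set of i.i.d. models.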

2.1 A first principles proof of Doeblin's characterization of weak ergodicity.

As we mentioned earlier, Doeblin's proof of his own characterization of weak ergodicity is a matter of
speculation. Though it is possible to prove that (3) and (6) are equivalent using Theorem 1 in [22]
and Corollary 2.2, here we venture an alternative and more elementary proof of this fact. For this,
fix an integer $m \ge 0$ and let $\alpha_n$ denote Doeblin's ergodicity coefficient of $\prod_{k=m}^{n} p_k$. Due to parts
(a) and (c) in Theorem 2.1, there are matrices $E_n \in \mathcal{E}$ and $M_n \in \mathcal{P}$ such that:

\[ \prod_{k=m}^{n} p_k = \alpha_n \cdot E_n + (1 - \alpha_n) \cdot M_n, \quad \text{with } \alpha(M_n) = 0. \]

In particular, for all $i, j, s \in S$ the following holds:

\[ \Big( \prod_{k=m}^{n} p_k \Big)(i,s) - \Big( \prod_{k=m}^{n} p_k \Big)(j,s) = (1 - \alpha_n) \cdot \big( M_n(i,s) - M_n(j,s) \big). \tag{8} \]

Assume first that condition (6) holds. Consider the sets of non-negative integers:

\[ I_n = \{ k \mid \exists j \text{ such that } m \le n_j \le k < n_{j+1} \le n \}; \]

\[ J_n = \{ j \mid \exists k \in I_n \text{ such that } n_j \le k < n_{j+1} \}. \]

Notice that $J_n$ is an interval of integers. Furthermore, there are stochastic matrices $L_n$ and $R_n$ such
that $\prod_{k=m}^{n} p_k = L_n \cdot \big( \prod_{k \in I_n} p_k \big) \cdot R_n$. In particular, due to Corollary 2.2, we find that

\[ (1 - \alpha_n) \le \Big( 1 - \alpha\Big( \prod_{k \in I_n} p_k \Big) \Big) \le \prod_{j \in J_n} \Big( 1 - \alpha\Big( \prod_{k=n_j}^{n_{j+1}-1} p_k \Big) \Big). \]

Since $(1 - x) \le \exp(-x)$, for $0 \le x \le 1$, the condition in (6) implies that $\lim_{n \to \infty} \alpha_n = 1$. Back in (8),
since each $M_n$ is a stochastic matrix, we conclude that

\[ \lim_{n \to \infty} \Big( \prod_{k=m}^{n} p_k \Big)(i,s) - \Big( \prod_{k=m}^{n} p_k \Big)(j,s) = 0. \]

This shows that condition (3) is also satisfied, i.e. $(p_k)_{k \ge 0}$ is weakly ergodic.

Conversely, assume that condition (3) holds. To show that condition (6) also applies, we first
prove that $(\alpha_n)_{n \ge 0}$ has a subsequence that converges to one. We show this by contradiction.
Suppose otherwise, so that $(1 - \alpha_n)$ stays bounded away from zero for all large $n$. In
particular, due to the identity in (8), it applies that

\[ \lim_{n \to \infty} \big( M_n(i,s) - M_n(j,s) \big) = 0, \]

for all $i, j, s \in S$. Fix $s_1 \in S$. Since each $M_n$ has at least one zero in the column associated with $s_1$,
there is $j_1 \in S$ and a subsequence $(n'_k)_{k \ge 0}$ such that $M_{n'_k}(j_1, s_1) = 0$ for all $k \ge 0$. Therefore,
$M_{n'_k}(i, s_1) \to 0$ as $k \to \infty$, for all $i \in S$. Now fix $s_2 \in S \setminus \{s_1\}$. Since each $M_{n'_k}$ has at least one
zero in the column associated with $s_2$, there is a subsequence $(n''_k)_{k \ge 0}$ of $(n'_k)_{k \ge 0}$ such that
$M_{n''_k}(i, s_1) \to 0$ and $M_{n''_k}(i, s_2) \to 0$ as $k \to \infty$, for all $i \in S$. Since $S$ is finite, a straightforward
inductive argument shows that there is a subsequence $(n_k)_{k \ge 0}$ such that

\[ \lim_{k \to \infty} M_{n_k}(i,s) = 0, \]

for all $i, s \in S$. However, the above is not possible because each $M_{n_k}$ is a stochastic matrix. As a
result, $(\alpha_n)_{n \ge 0}$ must have a subsequence that converges to one.

The previous argument shows that if $(p_k)_{k \ge 0}$ is weakly ergodic then, for all $m \ge 0$, there is
$n \ge m$ such that e.g. $\alpha\big( \prod_{k=m}^{n} p_k \big) \ge 1/2$. From this, condition (6) is immediate and we have proved
that conditions (3) and (6) are equivalent.



3 Occupancy distributions of homogeneous chains. 

In this section we return to our original motivation of approximating occupancy distributions in finite
homogeneous Markov chains. For this, notice that all the complexity associated with computing or
approximating occupancy distributions is due to the distributional dependence between consecutive
transitions of $X = (X_i)_{i \ge 0}$. Our next result yields a stochastic equivalent of $X$ based on Doeblin's
ergodicity coefficient that breaks (at random times) this dependence. To state the result, assume
that:

\[ p = \alpha \cdot E + (1 - \alpha) \cdot M, \tag{9} \]

for certain $0 < \alpha < 1$, $E \in \mathcal{E}$ and $M \in \mathcal{P}$, and denote as $e$ any of the rows of $E$.

Theorem 3.1. Assume that condition (9) is satisfied. Imagine a coin that shows E with probability
$\alpha$ and M with probability $(1 - \alpha)$ when tossed. The stochastic sequence $Y = (Y_i)_{i \ge 0}$ defined as
follows:

(i) $Y_0$ has distribution $\mu$, and

(ii) for each $i \ge 0$, the distribution of $Y_{i+1}$ conditioned on $(Y_0, \ldots, Y_i)$ is given by the following
procedure: toss the coin, and if the E-side comes up then draw $Y_{i+1}$ using the distribution $e(\cdot)$,
else draw $Y_{i+1}$ using the distribution $M(Y_i, \cdot)$,

has the same distribution as $X$.

Proof. Due to the definition of the $Y$ process, for each $i \ge 0$ and $s_0, \ldots, s_{i+1} \in S$, the following
applies:

\[ P(Y_{i+1} = s_{i+1} \mid Y_0 = s_0, \ldots, Y_i = s_i) = \alpha \cdot e(s_{i+1}) + (1 - \alpha) \cdot M(s_i, s_{i+1}) \]

\[ \phantom{P(Y_{i+1})} = \alpha \cdot E(s_i, s_{i+1}) + (1 - \alpha) \cdot M(s_i, s_{i+1}) = p(s_i, s_{i+1}). \]

In particular, $Y$ is a first-order homogeneous Markov chain with initial distribution $\mu$ and probability
transition matrix $p$. Hence $X$ and $Y$ have the same distribution. □
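
The coin-tossing construction of Theorem 3.1 translates directly into a sampler. The sketch below is a minimal illustration, not the authors' code: the function name, the seed and the floating-point parameters are assumptions, and the numbers are the decomposition of $p^2$ from the example in §3.1 as we read it. It records the times at which the E-side comes up.

```python
import random

def simulate(mu, alpha, e, M, n, rng):
    """Sample Y_0, ..., Y_n as in Theorem 3.1; also return the times the E-side came up."""
    states = list(range(len(mu)))
    y = rng.choices(states, weights=mu)[0]          # Y_0 has distribution mu
    path, breakers = [y], []
    for i in range(1, n + 1):
        if rng.random() < alpha:                    # E-side: draw from e, forgetting the past
            y = rng.choices(states, weights=e)[0]
            breakers.append(i)
        else:                                       # M-side: move according to row M(y, .)
            y = rng.choices(states, weights=M[y])[0]
        path.append(y)
    return path, breakers

# Assumed parameters: the decomposition of p^2 in the example of Section 3.1.
alpha = 0.31
e = [8 / 31, 14 / 31, 9 / 31]
M = [[19 / 23, 0.0, 4 / 23],
     [0.0, 1.0, 0.0],
     [16 / 69, 4 / 69, 49 / 69]]
mu = [1.0, 0.0, 0.0]

path, breakers = simulate(mu, alpha, e, M, 200, random.Random(0))
print(len(breakers), path[:20])
```

The segments of `path` between consecutive entries of `breakers` are the independent pieces exploited in the rest of this section.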

Next, we show how to exploit the random times at which the imaginary coin of the theorem
breaks the dependence between consecutive transitions of the $Y$-chain. The first application gives a
non-standard argument for the strong ergodicity of irreducible and aperiodic Markov chains. In the
second application, we refine this argument to obtain low-to-moderate complexity approximations
of occupancy distributions.

For what follows, recall that the total variation distance between two probability distributions
$u(\cdot)$ and $v(\cdot)$ supported over $\mathbb{N} = \{0, 1, \ldots\}$ is defined as:

\[ \|u - v\| = \sup_{A \subset \mathbb{N}} |u(A) - v(A)| = \frac{1}{2} \sum_{i=0}^{\infty} |u(i) - v(i)|, \]

where $u(i)$ and $v(i)$ denote $u(\{i\})$ and $v(\{i\})$, respectively. Accordingly, the total variation distance
between two $\mathbb{N}$-valued random variables $U$ and $V$ is defined as the total variation distance of their
distributions and it is denoted $\|U - V\|$.

3.1 Connections with strong ergodicity. 

It is well-known that if $p$ is irreducible and aperiodic then there is a unique stationary distribution,
i.e. a unique probability distribution $\pi$ such that $\pi p = \pi$. In this case, there are constants $c_0, c_1 > 0$,
which depend on $p$ but not on $\mu$, such that:

\[ \|X_n - \pi\| \le c_0 e^{-c_1 \cdot n}, \quad \text{for all } n \ge 0. \tag{10} \]

In particular, the distribution of $X_n$ is asymptotically independent of $n$. Using Theorem 3.1, one
may alternatively explain this phenomenon as follows. If $\alpha(p) > 0$ then there is a distribution $e$ such
that, regardless of the state where the chain is located, the next state will be picked from this
distribution with probability $\alpha(p)$. Each time this distribution is used, any information about the
states previously visited by the chain is lost. This distribution acts therefore as a "memory-breaker".
When $n$ is large, and even if $\alpha(p)$ is small, it is unlikely that no memory-breaker occurred between $Y_0$
and $Y_n$. Since all transitions after the last memory-breaker were controlled by $M$, the distribution
of $Y_n$ should be well-approximated by a mixture of distributions of the form $e M^t$. This intuition is
made precise in the following result. Due to part (b) in Theorem 2.1, notice that the optimal choice
for $\alpha$ is $\alpha(p)$.

Corollary 3.2. [5] Assume that condition (9) is satisfied. Let $m \ge 0$ and consider $S$-valued random
variables $Z_0, \ldots, Z_m$ such that $Z_t$ has distribution $e M^t$. If $I$ is a random index independent of
$(Z_0, \ldots, Z_m)$ such that $P[I = t] = \alpha (1 - \alpha)^t / (1 - (1 - \alpha)^{m+1})$, for $0 \le t \le m$; in particular,
$P[I \in \{0, \ldots, m\}] = 1$, then

\[ \|X_n - Z_I\| \le (1 - \alpha)^{m+1}, \quad \text{for all } n > m. \]

In particular, if $\alpha > 0$ and $p$ is irreducible and aperiodic then $\pi = \alpha e \big( I - (1 - \alpha) M \big)^{-1}$ and

\[ \|X_n - \pi\| \le 2 (1 - \alpha)^n, \quad \text{for all } n \ge 1. \]
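
The first bound of Corollary 3.2 can be checked exactly on a small chain. The sketch below is an illustration under assumptions: the helper names are hypothetical, total variation distance is taken as half the $\ell_1$ distance, and the chain has transition matrix $P = \alpha E + (1 - \alpha) M$ with the values of the $p^2$ decomposition in §3.1 as we read them. With $m = 8$ the bound is $(69/100)^9$, which is below 0.05.

```python
from fractions import Fraction as F

def vecmat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def tv(u, v):
    """Total variation distance, taken here as half the l1 distance."""
    return sum(abs(a - b) for a, b in zip(u, v)) / 2

def doeblin_mixture(alpha, e, M, m):
    """Law of Z_I: mixture of e*M^t, t = 0..m, with weights alpha(1-alpha)^t / (1-(1-alpha)^(m+1))."""
    norm = 1 - (1 - alpha) ** (m + 1)
    mix, d = [F(0)] * len(e), list(e)
    for t in range(m + 1):
        w = alpha * (1 - alpha) ** t / norm
        mix = [a + w * b for a, b in zip(mix, d)]
        d = vecmat(d, M)                      # d becomes e*M^(t+1)
    return mix

alpha = F(31, 100)
e = [F(8, 31), F(14, 31), F(9, 31)]
M = [[F(19, 23), F(0), F(4, 23)],
     [F(0), F(1), F(0)],
     [F(16, 69), F(4, 69), F(49, 69)]]
P = [[alpha * e[j] + (1 - alpha) * M[i][j] for j in range(3)] for i in range(3)]

m = 8
mix = doeblin_mixture(alpha, e, M, m)
bound = (1 - alpha) ** (m + 1)

dist, errors = [F(1), F(0), F(0)], []         # chain started at the first state
for n in range(1, 13):
    dist = vecmat(dist, P)                    # law of X_n
    if n > m:
        errors.append(tv(dist, mix))
print(float(bound), [float(x) for x in errors])
```

The same mixture serves every $n > m$, which is what makes the approximation cheap compared with computing $\mu P^n$ for each $n$.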
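To fix ideas, consider the probability transition matrix:

\[ p = \begin{bmatrix} \frac{3}{10} & 0 & \frac{7}{10} \\ 0 & \frac{9}{10} & \frac{1}{10} \\ \frac{4}{5} & \frac{1}{5} & 0 \end{bmatrix}. \tag{11} \]

Since $\alpha(p) = 0$, the inequalities of the corollary are trivial. However, observe that:

\[ p^2 = \begin{bmatrix} \frac{13}{20} & \frac{7}{50} & \frac{21}{100} \\ \frac{2}{25} & \frac{83}{100} & \frac{9}{100} \\ \frac{6}{25} & \frac{9}{50} & \frac{29}{50} \end{bmatrix} = \frac{31}{100} \cdot E_2 + \frac{69}{100} \cdot M_2, \]

with

\[ E_2 = \begin{bmatrix} \frac{8}{31} & \frac{14}{31} & \frac{9}{31} \\ \frac{8}{31} & \frac{14}{31} & \frac{9}{31} \\ \frac{8}{31} & \frac{14}{31} & \frac{9}{31} \end{bmatrix} \quad \text{and} \quad M_2 = \begin{bmatrix} \frac{19}{23} & 0 & \frac{4}{23} \\ 0 & 1 & 0 \\ \frac{16}{69} & \frac{4}{69} & \frac{49}{69} \end{bmatrix}. \]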

Notice that $\alpha_2 := \alpha(p^2) = 31/100$; in particular, the above decomposition of $p^2$ is a direct
consequence of part (c) in Theorem 2.1. Define $e_2$ as the first row of $E_2$. Imagine you would like to
approximate the distribution of some $X_n$, with as few matrix multiplications as possible, and within
a 5% accuracy in total variation distance. Define $\epsilon := 0.05$. Due to Corollary 3.2, this is possible
for any even number $n \ge 18$, by considering a mixture of the distributions $e_2 M_2^t$, with $t = 0, \ldots, 8$.
On the other hand, because Markovian kernels are contractive in total variation distance [17], this
is also possible for any odd number $n > 18$ by considering a mixture of the distributions $e_2 M_2^t p^1$
with $t = 0, \ldots, 8$. Either mixture can be computed in at most 10 matrix multiplications; however,
as seen in Table 1, this number can be optimized by considering larger powers of $p$. Indeed, it is
possible to approximate within $\epsilon$-units the distribution of each $X_n$, with $n \ge 16$, using a short
mixture of distributions associated with Doeblin's ergodicity coefficient of $p^4$. This mixture is given by:



\[ \sum_{t=0}^{3} \frac{(1 - \alpha_4)^t - (1 - \alpha_4)^{t+1}}{1 - (1 - \alpha_4)^4} \cdot e_4 \cdot M_4^t \cdot p^{n \,(\mathrm{mod}\ 4)}, \tag{12} \]



which can be computed using 7 matrix multiplications. In retrospect, this is far from obvious. For
instance, using a computer algebra system, one finds that:

\[ p^7 = \begin{bmatrix} 0.3444507000 & 0.3440640000 & 0.3114853000 \\ 0.1966080000 & 0.6114381000 & 0.1919539000 \\ 0.3559832000 & 0.3839078000 & 0.2601090000 \end{bmatrix}. \]

Since $\max_{i,j} \|p^7(i,\cdot) - p^7(j,\cdot)\| \approx 0.27$, the chain is still far from its stationary distribution even after
7 transitions. Indeed, $\max_{i} \|p^7(i,\cdot) - \pi(\cdot)\| > 0.13$, exceeding the total variation distance between
any $X_n$, with $n \ge 16$, and the distribution in (12).



3.2 Approximation of occupancy distributions. 

Assume that condition (9) is satisfied. Following the notation of Theorem 3.1, the occupancy
distribution of a set $T \subset S$ is the distribution of the random variable:

\[ T_n = \sum_{t=1}^{n} [Y_t \in T]. \]

The moment generating function (m.g.f.) of $T_n$ is given by $\mu \cdot \{p(z)\}^n \cdot 1$, where $p(z)$ is the matrix
with polynomial entries given by $p(z)(i,j) = p(i,j) \cdot z^{[j \in T]}$, and $1$ is a column-vector of ones. $p(z)$
is called a transfer matrix [10], and the computation of the exact distribution of $T_n$ is expensive
unless $n$ is relatively small. In what follows, we extend the argument of the previous section to
approximate this distribution.

  $k$    $\alpha_k$    $m_k$    $n_k$    $c_k$
  1      0             ∞        ∞        ∞
  2      0.31          8        18       10
  3      0.403         5        18       8
  4      0.5287        3        16       7
  5      0.63234       3        20       8
  6      0.758471      2        18       8
  7      0.857157      2        21       9

Table 1: Parameters associated with powers of the probability transition matrix in (11). Here
$\alpha_k = \alpha(p^k)$ and $p^k = \alpha_k E_k + (1 - \alpha_k) M_k$, with $E_k \in \mathcal{E}$ and $M_k \in \mathcal{P}$. In addition,
$m_k := \lceil \ln(\epsilon)/\ln(1 - \alpha_k) - 1 \rceil$, with $\epsilon := 0.05$, and $n_k := k(m_k + 1)$. Due to Corollary 3.2, if $e_k$ denotes any
of the rows of $E_k$ then, for each $n \ge n_k$, there exists a mixture of the distributions $e_k M_k^t p^{n \,(\mathrm{mod}\ k)}$,
$t = 0, \ldots, m_k$, which approximates the distribution of $X_n$ within $\epsilon$-units in total variation distance.
This mixture can be computed with at most $c_k := (k + m_k)$ matrix multiplications.
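
For small $n$, the exact occupancy distribution can be read off the coefficients of $\mu \cdot \{p(z)\}^n \cdot 1$. The sketch below is a plain dynamic-programming version of that transfer-matrix computation, written under our own naming and with a hypothetical two-state i.i.d. chain as test input. It tracks, for every state, the polynomial in $z$ whose $k$-th coefficient is the probability of being in that state with $k$ visits to $T$ so far.

```python
from fractions import Fraction as F

def occupancy_distribution(mu, p, T, n):
    """Exact law of T_n = #{1 <= t <= n : X_t in T}, as a list indexed by the count k."""
    S = range(len(mu))
    # f[s][k] = P[X_t = s and k visits to T among X_1..X_t]; at t = 0 the count is 0.
    f = [[mu[s]] for s in S]
    for t in range(1, n + 1):
        g = [[F(0)] * (t + 1) for _ in S]
        for i in S:
            for k, mass in enumerate(f[i]):
                if mass == 0:
                    continue
                for j in S:
                    # multiplying by p(z)(i, j) = p(i, j) * z^[j in T]
                    g[j][k + (1 if j in T else 0)] += mass * p[i][j]
        f = g
    return [sum(f[s][k] for s in S) for k in range(n + 1)]

# Hypothetical i.i.d. chain on two states: T_3 is then Binomial(3, 1/2).
p_iid = [[F(1, 2), F(1, 2)],
         [F(1, 2), F(1, 2)]]
law = occupancy_distribution([F(1), F(0)], p_iid, {1}, 3)
print(law)
```

The cost grows like $n^2 |S|^2$ operations on exact rationals, which illustrates why this route becomes expensive as $n$ increases.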

Notice that the random variables $[Y_i \in T]$ and $[Y_j \in T]$, with $i < j$, are independent when at
least one of the random variables $Y_{i+1}, \ldots, Y_j$ is drawn from the memory-breaker distribution $e$.
In particular, the times at which the E-side of the coin appears cut the trajectory $Y_0, \ldots, Y_n$ into
independent "pieces". The number of such pieces is random, and consecutive transitions in each piece
are governed by the matrix $M$. Furthermore, the initial distribution of each piece is $e$, except for the
first piece which has initial distribution $\mu$. The expected number of memory-breakers between the
first and last transition is $\alpha n$ and the average separation between consecutive memory-breakers is
$1/\alpha$, regardless of $n$. As a result, a mixture of $e(z) \cdot \{M(z)\}^m \cdot 1$, with $m$ an integer in a neighborhood
of $1/\alpha$, should lead to a decent approximation of the m.g.f. of the occupancy distribution of $T$ in each
piece other than the first. For the first piece, the m.g.f.'s to consider are of the form $\mu \cdot \{M(z)\}^m \cdot 1$.
Since the behavior of the Markov chain is independent from one piece to another, an approximation
for the m.g.f. of $T_n$ should follow. More importantly for computations, a power of order $o(n)$
of the transfer matrix $M(z)$ should suffice for a decent approximation of the distribution of $T_n$.

The weakest point of this heuristic is the probable occurrence of longer than expected pieces at
already intermediate values of $n$. This motivates us to look at the random variable $L_n$ defined as
the length of the longest piece. (In probabilistic terms, $L_n$ is the length of the longest run of M's
in $n$ tosses of the coin from Theorem 3.1.) The asymptotic distribution of this random variable is
well understood, both via combinatorial and probabilistic methods [9l [101 Q]. Since the distribution
of $L_n$ concentrates around $-\ln(\alpha n)/\ln(1 - \alpha)$ as $n$ increases, selecting $m = -c \ln(\alpha n)/\ln(1 - \alpha)$,
for $c > 1$, gives $P[L_n \le m] = 1 + O(n^{1-c})$. An explicit upper-bound for the error in total variation
distance follows now from the next result. We notice that the m.g.f. of the random variable $W_I$ in
the corollary can be computed explicitly via a symbolic specification [10].
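
The probability $P[L_n \le m]$ discussed above can be computed exactly with a short recursion on the current run length of M's. The sketch below is an illustration with hypothetical function names; `suggested_m` simply rounds up the choice $m = -c \ln(\alpha n)/\ln(1 - \alpha)$ from the text, and the numerical inputs ($\alpha = 0.31$, $n = 1000$, $c = 2$) are assumptions made for the example.

```python
import math
from fractions import Fraction as F

def prob_longest_run_at_most(n, m, alpha):
    """P[L_n <= m], with L_n the longest run of M's in n tosses and P(E-side) = alpha."""
    zero = alpha * 0                      # keeps the arithmetic type of alpha (float or Fraction)
    dist = [zero] * (m + 1)               # dist[r] = P[current run of M's is r, no run exceeded m]
    dist[0] = zero + 1
    for _ in range(n):
        new = [zero] * (m + 1)
        new[0] = alpha * sum(dist)                 # E-side: the run of M's resets
        for r in range(m):
            new[r + 1] = (1 - alpha) * dist[r]     # M-side: the run grows by one
        dist = new                                  # mass with run length m + 1 is dropped
    return sum(dist)

def suggested_m(n, alpha, c):
    """Smallest integer at least -c * ln(alpha * n) / ln(1 - alpha)."""
    return math.ceil(-c * math.log(alpha * n) / math.log(1 - alpha))

m = suggested_m(1000, 0.31, 2)
print(m, prob_longest_run_at_most(1000, m, 0.31))
print(prob_longest_run_at_most(2, 1, F(1, 2)))
```

Running the recursion for a few values of $c$ gives a quick feel for how fast the probability of an overly long piece decays.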

Corollary 3.3. [5] Assume that condition (9) is satisfied. Fix $m \ge 0$ and define $\ell_n = P[L_n \le m]$.
Let $I = (I_0, \ldots, I_k)$ be a random composition of $n$ such that $P[I = (i_0, i_1, \ldots, i_k)] = \alpha^k (1 - \alpha)^{n-k} / \ell_n$,
for all $k \ge 0$, $0 \le i_0 \le m$, $1 \le i_l \le (m + 1)$, for $l \ge 1$, and such that $\sum_{l=0}^{k} i_l = n$. In addition,
consider independent random variables $U(l)$, $V(i,l)$ which are independent of $I$ and such that $U(l)$
has m.g.f. $\mu \cdot \{M(z)\}^{l} \cdot 1$ and $V(i,l)$ has m.g.f. $e(z) \cdot \{M(z)\}^{l-1} \cdot 1$. If one defines
$W_I := U(I_0) + \sum_{i=1}^{k} V(i, I_i)$ then

||T n -Wz||<^=-^. (13) 
As a numerical example, we select a stationary and homogeneous Markov chain from [8] with
state space $S = \{1, \ldots, 8\}$ and probability transition matrix



l-q(i,T) 



6T c ,jeT c ; 
£T c ,jeT; 



(14) 



where

\[ q = \begin{pmatrix}
0.334 & 0.215 & 0.173 & 0.119 & 0.065 & 0.086 & 0.003 & 0.005 \\
0.289 & 0.133 & 0.211 & 0.133 & 0.067 & 0.156 & 0.007 & 0.004 \\
0.356 & 0.184 & 0.075 & 0.043 & 0.151 & 0.183 & 0.002 & 0.006 \\
0.41 & 0.162 & 0.108 & 0.075 & 0.14 & 0.097 & 0.005 & 0.003 \\
0.316 & 0.239 & 0.044 & 0.218 & 0.076 & 0.098 & 0.004 & 0.005 \\
0.44 & 0.176 & 0.044 & 0.242 & 0.088 & 0 & 0.005 & 0.005 \\
0.18 & 0.06 & 0.19 & 0.09 & 0.13 & 0.1 & 0.13 & 0.12 \\
0.2 & 0.16 & 0.07 & 0.1 & 0.14 & 0.1 & 0.09 & 0.14
\end{pmatrix}. \]



The goal is to approximate the occupancy distribution of the set $T = \{8\}$ for various values of $n$
and $\beta$. The parameter $\beta$ controls transitions to $T$, which become rare for $\beta$ small. Table 2 gives
exact total variation distance errors for Normal [10] and compound Poisson approximations [5] as
well as our approximation in (13). As shown in the table, approximation (13) gives one order of
magnitude or more improvement over the compound Poisson approximation. Furthermore, it is clear
that $n = 1000$ may not be large enough for an accurate Normal approximation to the occupancy
distribution of $T$.



  $n$     $\beta$   Normal Approximation   Compound Poisson Approximation   Approximation (13)
  10      1         1.7E-2                 3.2E-3                           3.1E-4
  10      0.5       1.7E-2                 1.2E-3                           1.3E-4
  10      0.2       1.3E-2                 4.9E-4                           5.6E-5
  10      0.1       5.3E-3                 1.7E-4                           2.1E-5
  10      0.01      5.3E-4                 1.5E-5                           2.1E-6
  10                5.3E-5                 1.5E-6                           2.1E-7
  100     1         0.23                   9.7E-3                           2.3E-4
  100     0.5       0.22                   3.5E-3                           1.6E-4
  100     0.25      0.14                   1.3E-3                           7.5E-5
  100     0.1       2.0E-2                 3.1E-4                           3.1E-5
  100     0.01      5.2E-3                 1.6E-5                           3.3E-6
  100               5.3E-4                 1.5E-6                           3.3E-7
  1000    1         6.9E-2                 9.4E-3                           2.1E-5
  1000    0.5       9.0E-2                 4.9E-3                           1.4E-5
  1000    0.25      0.14                   2.7E-3                           8.2E-6
  1000    0.1       0.23                   9.6E-4                           1.1E-5
  1000    0.01      2.0E-2                 2.7E-5                           1.8E-6
  1000              5.2E-3                 1.7E-6                           2.0E-7

Table 2: Total variation distance for approximations to the occupancy distribution of the set $T = \{8\}$
for the stationary chain described by (14). The compound Poisson approximation, given by [5], is a
Polya-Aeppli distribution.